%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

Linear models for regression

y_pred = x_test[0] * coef_[0] + ... + x_test[n_features-1] * coef_[n_features-1] + intercept_

In [4]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y, true_coefficient = make_regression(n_samples=80, n_features=30, n_informative=10, noise=100, coef=True, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)
print(X_train.shape)
print(y_train.shape)


(60, 30)
(60,)

Linear Regression

$$ \min_{w, b} \sum_i \| w^\mathsf{T} x_i + b - y_i \|^2 $$

In [5]:
from sklearn.linear_model import LinearRegression
linear_regression = LinearRegression().fit(X_train, y_train)
print("R^2 on training set: %f" % linear_regression.score(X_train, y_train))
print("R^2 on test set: %f" % linear_regression.score(X_test, y_test))


R^2 on training set: 0.877842
R^2 on test set: 0.492723
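
As a sanity check on the prediction formula at the top of this section: the manual dot product should reproduce predict exactly (a small sketch, not part of the original notebook).

manual_pred = np.dot(X_test, linear_regression.coef_) + linear_regression.intercept_
print(np.allclose(manual_pred, linear_regression.predict(X_test)))  # prints True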

Because the targets were generated with noise, even the true coefficients cannot explain all of the variance in y; their R^2 is roughly the best any linear model can achieve on this data:

In [4]:
from sklearn.metrics import r2_score
print(r2_score(np.dot(X, true_coefficient), y))


0.729977472078

In [11]:
plt.figure(figsize=(10, 5))
coefficient_sorting = np.argsort(true_coefficient)[::-1]
plt.plot(true_coefficient[coefficient_sorting], "o", label="true")
plt.plot(linear_regression.coef_[coefficient_sorting], "o", label="linear regression")

plt.legend()


Out[11]:
<matplotlib.legend.Legend at 0x7f1edc05a978>

Ridge Regression (L2 penalty)

$$ \min_{w, b} \sum_i \| w^\mathsf{T} x_i + b - y_i \|^2 + \alpha \|w\|_2^2 $$


In [12]:
from sklearn.linear_model import Ridge
ridge_models = {}
training_scores = []
test_scores = []

for alpha in [100, 10, 1, .01]:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    training_scores.append(ridge.score(X_train, y_train))
    test_scores.append(ridge.score(X_test, y_test))
    ridge_models[alpha] = ridge

plt.figure()
plt.plot(training_scores, label="training scores")
plt.plot(test_scores, label="test scores")
plt.xticks(range(4), [100, 10, 1, .01])
plt.legend(loc="best")


Out[12]:
<matplotlib.legend.Legend at 0x7f1ed406acf8>

In [13]:
plt.figure(figsize=(10, 5))
plt.plot(true_coefficient[coefficient_sorting], "o", label="true", c='b')

for i, alpha in enumerate([100, 10, 1, .01]):
    plt.plot(ridge_models[alpha].coef_[coefficient_sorting], "o", label="alpha = %.2f" % alpha, c=plt.cm.summer(i / 3.))
    
plt.legend(loc="best")


Out[13]:
<matplotlib.legend.Legend at 0x7f1ecf52d8d0>
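
Instead of scanning alpha values by hand, scikit-learn's RidgeCV can pick the regularization strength by cross-validation. A minimal sketch using the same candidates (not part of the original notebook):

from sklearn.linear_model import RidgeCV

ridge_cv = RidgeCV(alphas=[100, 10, 1, .01]).fit(X_train, y_train)
print(ridge_cv.alpha_)  # alpha selected by (efficient leave-one-out) cross-validation
print(ridge_cv.score(X_test, y_test))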

Lasso (L1 penalty)

$$ \min_{w, b} \sum_i \| w^\mathsf{T} x_i + b - y_i \|^2 + \alpha \|w\|_1 $$


In [14]:
from sklearn.linear_model import Lasso

lasso_models = {}
training_scores = []
test_scores = []

for alpha in [30, 10, 1, .01]:
    lasso = Lasso(alpha=alpha).fit(X_train, y_train)
    training_scores.append(lasso.score(X_train, y_train))
    test_scores.append(lasso.score(X_test, y_test))
    lasso_models[alpha] = lasso
plt.figure()
plt.plot(training_scores, label="training scores")
plt.plot(test_scores, label="test scores")
plt.xticks(range(4), [30, 10, 1, .01])
plt.legend(loc="best")


Out[14]:
<matplotlib.legend.Legend at 0x7f1ecf4b9860>

In [15]:
plt.figure(figsize=(10, 5))
plt.plot(true_coefficient[coefficient_sorting], "o", label="true", c='b')

for i, alpha in enumerate([30, 10, 1, .01]):
    plt.plot(lasso_models[alpha].coef_[coefficient_sorting], "o", label="alpha = %.2f" % alpha, c=plt.cm.summer(i / 3.))
    
plt.legend(loc="best")


Out[15]:
<matplotlib.legend.Legend at 0x7f1ecf4a19b0>
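
Unlike ridge, the L1 penalty drives coefficients exactly to zero, so stronger regularization gives sparser models. A quick check using the models fitted above (a small sketch):

for alpha in [30, 10, 1, .01]:
    n_nonzero = np.sum(lasso_models[alpha].coef_ != 0)
    print("alpha = %.2f: %d non-zero coefficients" % (alpha, n_nonzero))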

Linear models for classification

y_pred = x_test[0] * coef_[0] + ... + x_test[n_features-1] * coef_[n_features-1] + intercept_ > 0
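
The decision rule can be verified against a fitted binary classifier; a small self-contained sketch on a two-class toy problem (not part of the original notebook):

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

Xb, yb = make_blobs(centers=2, random_state=0)
svc = LinearSVC().fit(Xb, yb)
# manual decision rule: a positive decision value means class 1
manual = (np.dot(Xb, svc.coef_.ravel()) + svc.intercept_[0] > 0).astype(int)
print(np.all(manual == svc.predict(Xb)))  # prints True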

The influence of C in LinearSVC


In [16]:
from plots import plot_linear_svc_regularization
plot_linear_svc_regularization()
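
plot_linear_svc_regularization is a helper shipped with this tutorial's plots module. If it is not available, something similar can be reproduced directly; a rough sketch (my own, not the tutorial's helper) that fits LinearSVC for several values of C and draws the resulting decision boundaries:

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

Xc, yc = make_blobs(centers=2, random_state=4, cluster_std=2.0)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, C in zip(axes, [0.001, 1, 1000]):
    svm = LinearSVC(C=C).fit(Xc, yc)
    ax.scatter(Xc[:, 0], Xc[:, 1], c=yc)
    w, b = svm.coef_.ravel(), svm.intercept_[0]
    xs = np.linspace(Xc[:, 0].min(), Xc[:, 0].max())
    ax.plot(xs, -(xs * w[0] + b) / w[1])  # boundary: w0*x0 + w1*x1 + b = 0
    ax.set_title("C = %g" % C)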


Multi-class linear classification


In [17]:
from sklearn.datasets import make_blobs
plt.figure()
X, y = make_blobs(random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y)


Out[17]:
<matplotlib.collections.PathCollection at 0x7f1ece14fb38>

In [18]:
from sklearn.svm import LinearSVC
linear_svm = LinearSVC().fit(X, y)
print(linear_svm.coef_.shape)
print(linear_svm.intercept_.shape)


(3, 2)
(3,)
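
One row of coef_ and one entry of intercept_ per class: LinearSVC fits one binary classifier per class (one-vs-rest), and prediction picks the class with the highest decision value. A quick check (a small sketch):

decision = np.dot(X, linear_svm.coef_.T) + linear_svm.intercept_  # shape (n_samples, 3)
print(np.all(np.argmax(decision, axis=1) == linear_svm.predict(X)))  # prints True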

In [19]:
plt.scatter(X[:, 0], X[:, 1], c=y)
line = np.linspace(-15, 15)
for coef, intercept in zip(linear_svm.coef_, linear_svm.intercept_):
    plt.plot(line, -(line * coef[0] + intercept) / coef[1])
plt.ylim(-10, 15)
plt.xlim(-10, 8)


Out[19]:
(-10, 8)

Exercises

Load the bike rental and bank campaign datasets from the last exercise. For each dataset, try L1-penalized and L2-penalized linear models and plot the coefficients. Which features are important? Change the regularization parameter and check how training set performance and test set performance change.
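
A possible skeleton for the regression dataset, assuming X and y are already loaded (for the classification dataset, LogisticRegression with penalty="l1" or penalty="l2" plays the analogous role):

from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# X, y = ...  # load the bike rental data here
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for Model, alphas in [(Ridge, [100, 10, 1, .01]), (Lasso, [30, 10, 1, .01])]:
    plt.figure()
    for alpha in alphas:
        model = Model(alpha=alpha).fit(X_train, y_train)
        print("%s(alpha=%.2f): train %.3f, test %.3f"
              % (Model.__name__, alpha,
                 model.score(X_train, y_train), model.score(X_test, y_test)))
        plt.plot(model.coef_, "o", label="alpha = %.2f" % alpha)
    plt.title(Model.__name__)
    plt.legend(loc="best")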